-
-
-
-
-
-
-
- TEXTCON (Version 1.3)
- A Program for Conversion of ASCII Files
- Between Word Processors
-
- Chris Wolf
-
- November 30, 1986
-
-
- PURPOSE:
-
-
- Virtually all word processors will import ASCII files, but
- anyone who has tried it knows that the results are often
- less than optimal. Documents that are transferred this way
- almost always require a great deal of manual "cleaning up"
- to get them into the desired format.
-
- TEXTCON is a file "pre-processor" for MSDOS computers that
- does much of the necessary conversion automatically. The
- ASCII files that it produces are in a form more suitable for
- importation to most word processors. TEXTCON will not
- eliminate all manual editing, but it makes the job much
- easier.
-
- TEXTCON has tremendous power and flexibility that can be
- useful for tasks involving data base files, desktop
- publishing, and program editing. TEXTCON users have found
- the program helpful for many kinds of file manipulations,
- such as adding line feeds where only carriage returns are
- present, expanding tabs to spaces, removing all blank lines
- from a file, etc.
-
- (. . . caution . . . advertisement follows . . .)
-
- And now, those who contribute $25 or more for TEXTCON will
- be sent TEXTDCA, which has all the features of TEXTCON, but
- will also write files in IBM DCA/RFT format. This allows
- ASCII or WordStar files to be imported to your word
- processor with spacing characteristics such as margins,
- indents, centering, tab stops, etc. intact.
-
- If you use a word processor that accepts DCA/RFT format
- files (this includes Word Perfect, Microsoft Word, IBM
- Displaywrite, MultiMate, Volkswriter 3, WordStar 2000, and
- many others), TEXTDCA is simply the best program available
- for importing ASCII files.
-
- This can save you a great deal of time that would otherwise
- be spent reformatting the imported file. An additional
- feature of TEXTDCA is a menu-driven mode (on PC-compatibles
- only) which simplifies the selection of processing options.
-
- (. . . end of advertisement . . .)
-
- It's convenient to divide the functions performed by TEXTCON
- into five main categories:
-
- 1. Removing carriage returns
-
- The most common problem when importing ASCII files into
- word processors is that each line from the original file
- will end in a "hard" carriage return. In most cases
- these have to be removed manually in order to get the
- document formatted properly on the new word processor.
- TEXTCON uses a sophisticated algorithm to determine
- which sections of text constitute "paragraphs", and then
- it removes all carriage returns except those at the ends
- of paragraphs. (For this purpose, a paragraph is
- defined as a block of text where it is desirable for
- words to wrap to following or previous lines when editing
- or formatting changes are made.)
-
- TEXTCON can cope with almost any paragraph format
- including difficult ones like fully indented (nested),
- hanging indent, outline style, etc. It does not depend
- on double spacing or first-line indentation, although
- these are recognized. It will handle print-formatted
- files (i.e., those having a left margin of blanks), as
- well as the totally unformatted files used as input to
- formatters like NROFF (as long as they use "dot"
- commands). It is designed to recognize header lines,
- tables, etc. and will avoid reformatting them.
-
- (Of course, document formats vary widely, and it really
- takes human intelligence to recognize paragraph breaks
- with 100% accuracy. TEXTCON will occasionally make
- mistakes when dealing with particularly tricky formats.)
-
- 2. Adding carriage returns
-
- TEXTCON will also do the opposite process if you wish,
- adding carriage returns to files that have them only at
- the ends of paragraphs. Or it can deal with special
- file formats by substituting carriage returns for some
- other special character that is used to represent a
- paragraph end.
-
- 3. Removing blanks
-
- An ASCII file may have extraneous blanks that cause
- problems if they are imported to a word processor.
- There may be blanks at the beginnings of lines for a
- left margin or for indented or nested text. There may
- be extra blanks within lines for justified text or
- between columns of a table (where tabs are more
- desirable). TEXTCON removes extraneous blanks and,
- where appropriate, replaces them with tabs, thus saving
- manual editing time.
-
- 4. Removing extraneous lines
-
- Some ASCII files have extraneous lines that TEXTCON will
- remove for you. Print-formatted files, for example,
- sometimes have additional lines inserted solely for
- underlining or boldface. TEXTCON will remove these so
- you don't have to. TEXTCON recognizes double-spaced
- files and converts them to single spacing. Lines that
- consist solely of "dot" commands (like WordStar's .PA)
- are converted to blank lines. You can also remove (or
- add) lines by setting the spacing between paragraphs to
- any specific number of blank lines you wish.
-
- Consecutive runs of more than two blank lines are
- reduced to two, which may help with files that have been
- formatted for a printer. TEXTCON also tries to
- recognize page breaks and eliminate all blank lines
- between pages if possible. The situation is more
- complicated if the file contains headers or footers, but
- there is now an option which can, in certain cases,
- remove these as well.
-
- 5. Removing or converting characters
-
- TEXTCON translates all characters in WordStar files to
- their ASCII equivalents. It also removes all non-
- printing ASCII characters (except tabs) unless you ask
- that they be kept. It has three optional methods for
- dealing with line-ending hyphens. TEXTCON does not
- alter or remove the IBM extended ASCII characters, used
- for math symbols, letters from foreign alphabets, etc.
-
- TEXTCON was designed to be as automatic as possible in its
- operation so it can be used by someone with very little
- knowledge about the files being converted. Although it has
- many options for specialized kinds of conversions, it will
- work very well on a wide variety of files without the use of
- any of the options.
-
- The options are described in a later section. If TEXTCON
- doesn't seem to work quite as you want it to, you may want
- to read those descriptions. They include more detailed
- information about the kinds of changes TEXTCON makes to a
- file and how you can control these changes.
-
- TEXTCON is useful for importing text to many microcomputer
- word processors, including Microsoft Word, Word Perfect, and
- WordStar, as well as some office automation systems, such as
- NBI. If a word processor exhibits problems with "hard
- carriage returns" when you import ASCII files, then the
- chances are that TEXTCON will help.
-
- Some PC word processors, including Volkswriter, MultiMate,
- and PC-Write, actually require the hard returns, and have
- trouble with files that do not include them. TEXTCON can
- add carriage returns to files so that these word processors
- can import them successfully.
-
- If you are working with a word processor that will accept
- DCA/RFT format files, the TEXTDCA version of TEXTCON offers
- an even higher level of performance than TEXTCON itself.
-
- I am making TEXTCON available for distribution without
- charge, but I hope that anyone who uses it on a regular
- basis will make a monetary contribution toward the time I
- put into developing and supporting it.
-
-
-
- USE:
-
-
- To run the program, use the command form:
- TEXTCON [options] infile outfile
- where the available options are described in the following
- section. (Again, TEXTCON will handle most conversions very
- well with no options specified, so most users can ignore all
- mention of options here.) The file names can include the
- disk-drive identifier and a path name, if appropriate. A
- typical command with no options would be as follows:
- TEXTCON A:PROPOSAL B:PROPOSAL.TXT
- The options are identified by a preceding hyphen as a flag
- character, so a command with options might look as follows:
- TEXTCON -T5 -B TEXT.DOC B:TEXT.ASC
-
- Multiple options can be combined, using a single hyphen, to
- appear as follows:
- TEXTCON -T5B TEXT C:\DOCS\TEXT.OUT
- You must be careful when combining options this way,
- especially if you are using options with numeric "names" or
- those with sub-options. For example, if you wanted to use
- the options -T3 -2 -KC -B, and you combined them as -T32KCB,
- this would be interpreted as -T32 -KCB, which is very
- different from what you intended. If, instead, you combined them
- as -2BT3KC, they would be interpreted correctly. If there
- is any question in your mind about this, keep all of the
- options separate on the command line. (The menu system in
- TEXTDCA simplifies this quite a bit.)
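-
- The way combined options are read can be sketched as follows.
- This is only an illustration in Python, not TEXTCON's own code
- (which is written in C); the function name and the two tables
- are assumptions built from the option list in this document.
- It shows why -T32KCB and -2BT3KC come out so differently.
-

```python
# Hypothetical sketch: how a combined option group might be split.
# Option letters are taken from this manual; the parsing rules are assumed.
NUMERIC_OPTS = set("MFHZTIPLS")   # options followed by a number
KEEP_SUBOPTS = set("SCRB")        # sub-options that may follow K


def parse_group(arg):
    """Split one group such as '-T32KCB' into (name, value) pairs."""
    body = arg.lstrip("-").upper()
    pairs = []
    i = 0
    while i < len(body):
        name = body[i]
        i += 1
        j = i
        if name in NUMERIC_OPTS:
            # A numeric option swallows every digit that follows it.
            while j < len(body) and body[j].isdigit():
                j += 1
        elif name == "K":
            # K swallows every following letter that is a valid sub-option.
            while j < len(body) and body[j] in KEEP_SUBOPTS:
                j += 1
        pairs.append((name, body[i:j]))
        i = j
    return pairs


print(parse_group("-T32KCB"))   # T takes "32", K takes "CB"
print(parse_group("-2BT3KC"))   # 2, B, T3 and KC stay separate
```

- In the first call the "2" is absorbed by -T and the "B" by -K,
- which is exactly the misreading described above.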
-
- If you specify an illegal option, such as -Q, the program
- will display the legal options.
-
- The file specified as "infile" must be an ASCII file or a
- WordStar file; TEXTCON will not work on an internal word
- processor file. Some word processors, including PC-Write
- and Volkswriter, always keep their text in ASCII files. For
- other word processors, such as MultiMate, Word Perfect, or
- Microsoft Word, you will have to create an ASCII copy of
- your file before TEXTCON will work with it. If you try to
- convert an internal file, you may not get an error message
- from TEXTCON, but when you load the converted file into
- another word processor, it will probably contain gibberish.
-
-
-
- OPTIONS:
-
-
- Before converting a file, TEXTCON analyzes the initial
- portion to determine certain overall characteristics of the
- document. During the conversion, the program applies a
- complex set of rules on a line-by-line and character-by-
- character basis to determine localized formatting
- information. Because of this, the optional parameters
- described here are not usually needed. In any case, you
- should certainly try a few conversions before using any of
- these options. Unless you notice problems or are simply
- curious about the options, you can ignore the following
- section.
-
- IMPORTANT: The letters used to select options have changed
- substantially in Version 1.3. If you have used options in
- earlier versions, be sure to check the new descriptions
- below carefully, or at least read the "HISTORY" section at
- the end of this document for a summary of the changes. If
- you use the old option commands, you may get very strange
- results because of their new meanings.
-
- The following describes each of the conversion options
- available in the program. Note that some of them are
- interrelated or similar in function. As a conceptual aid, they
- are organized into two groups. Those in the first group are
- descriptive of the input file format. If you can provide
- this additional information to TEXTCON, it can do a better
- job of conversion. The options in the second group describe
- certain types of processing that you want TEXTCON to
- perform. These options have a direct effect on the format
- of the output file.
-
- The options are shown in upper case, but lower case is
- acceptable as well.
-
-
-
- INPUT FORMAT DESCRIPTORS
-
- 1. -1, -2
-
- TEXTCON is designed to recognize the line spacing
- (single or double) used in a file, but in some rare
- cases it will make a mistake. This will often happen
- when the initial part of the document (the part that
- TEXTCON analyzes before starting the conversion) has
- different spacing than the rest. When TEXTCON finishes
- its analysis of a file, it displays on the screen what
- it determined the spacing to be. If this is wrong, you
- will have to use the -1 or -2 option to specify that the
- input file is single- or double-spaced.
-
- You can also detect an improper spacing decision from
- problems in the output file. The usual symptom is that
- the converted file either will contain many hard carriage
- returns and be double-spaced, or will have many
- paragraphs run together.
-
- If TEXTCON's double-space option is in effect, either
- through its own decision or because you specified it,
- single occurrences of blank lines are totally ignored,
- as if they simply were not in the file. Two consecutive
- blank lines are treated as if there were only a single
- blank line. Occasionally you may find that this causes
- some paragraphs to run together in the converted file.
- This would be most likely to happen if single and double
- spacing are mixed in the same document, although
- normally TEXTCON will handle this correctly.
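-
- The blank-line rule for double-spaced input can be sketched
- like this. It is a Python illustration with a hypothetical
- function name, not TEXTCON's own code. The text only defines
- runs of one and two blank lines; halving longer runs (and
- rounding down) is an assumption made here.
-

```python
def collapse_double_spacing(lines):
    """Drop single blank lines; turn two consecutive blanks into one.

    Longer runs are halved (rounding down), which is an assumption:
    the manual only describes runs of one and two blank lines.
    Blank lines at the very end of the input are discarded.
    """
    out = []
    run = 0  # number of consecutive blank lines seen so far
    for line in lines:
        if line.strip() == "":
            run += 1
            continue
        out.extend([""] * (run // 2))
        run = 0
        out.append(line)
    return out


sample = ["First line", "", "second line", "", "", "New paragraph"]
print(collapse_double_spacing(sample))
```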
-
-
- 2. -B
-
- This option tells TEXTCON that your file has only block-
- style paragraphs, i.e., there are no paragraphs with
- first-line indents or outdents. TEXTCON doesn't need to
- know this in order to process a file, but there are some
- cases where it can do a better job if it does. This
- should be thought of as a little "tweak" for those who
- want the absolute best performance. If you use it for a
- file that contains non-blocked paragraphs, of course,
- performance will be worse.
-
-
- 3. -M#
-
- TEXTCON automatically determines the size of the
- document's left margin, but again, it may make a mistake
- if the margin becomes smaller toward the end of the
- document. If this happens, the conversion will stop at
- that point with an error message. It will tell you what
- the new, smaller margin value is, and instruct you to
- rerun TEXTCON using the -M option with that value. This
- is the only case where you should need to use this
- option.
-
-
- 4. -F#, -H#
-
- These are two of the trickier options in TEXTCON, and
- should be used with caution. Their purpose is to remove
- running headers and footers from page-formatted files,
- so they don't wind up intermingled with the text. They
- have the potential to save a lot of manual editing time
- on some files, but they can mistakenly remove text lines
- instead. Of course, the original file is not modified
- in any case, so if it doesn't work correctly you can
- rerun TEXTCON without these options.
-
- The numeric parameter used with these options is the
- number of the line on each page that contains the header
- or footer. If you don't want to figure this out
- yourself, you can omit the number or use a value of
- zero, and TEXTCON will try to determine which line(s)
- contain the header and/or footer. Thus, -H3 -F64 would
- ask TEXTCON to remove the third and sixty-fourth lines
- of each page and attempt to join the text across page
- boundaries. -F by itself, on the other hand, would
- imply there was no running header and that TEXTCON
- should determine which line number appears to be a
- footer.
-
- These options depend on a number of assumptions:
- o that your document either has exactly 66 lines per
- page, or has fewer than 66 lines per page and uses
- form feed characters to go to a new page (extra lines
- without linefeeds, added for underlining or boldface,
- are stripped and don't count toward the 66 lines per
- page),
- o that the header or footer is only one line long,
- o that the header or footer always appears on the same
- line of every page, and
- o that, if you do not specify the line number(s), a
- running header and/or footer occurs within the first
- two pages of the file.
-
- If a file meets these criteria, TEXTCON will remove the
- desired lines, usually even combining paragraphs across
- page boundaries. If a file diverges slightly from that
- description, TEXTCON may erroneously delete text lines
- from the file. The best advice is to examine closely
- any file that has been created using this option.
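-
- For the simplest case (exactly 66 lines per page and explicit
- line numbers, as in -H3 -F64), the removal can be sketched as
- below. This is a Python illustration with hypothetical names,
- not TEXTCON's own code; it does not handle form feeds, automatic
- detection, or rejoining paragraphs across pages.
-

```python
def strip_header_footer(lines, header=None, footer=None, page_len=66):
    """Remove the given 1-based line numbers from every page.

    Assumes fixed-length pages; a value of None or 0 means "nothing
    to remove" here (TEXTCON itself would try to detect the line).
    """
    drop = {n for n in (header, footer) if n}
    return [
        line
        for index, line in enumerate(lines)
        if (index % page_len) + 1 not in drop
    ]


# Two 66-line pages whose lines are labelled by page and line number.
pages = [f"p{p} l{n}" for p in (1, 2) for n in range(1, 67)]
cleaned = strip_header_footer(pages, header=3, footer=64)
print(len(pages), "->", len(cleaned))
```

- If the real page length differs even slightly from 66, the same
- arithmetic removes the wrong lines, which is why the converted
- file should be checked closely.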
-
-
- 5. -W
-
- TEXTCON recognizes WordStar files automatically and
- processes them accordingly. When doing this, TEXTCON
- assumes that the writer used WordStar "correctly",
- taking advantage of all of its formatting abilities.
-
- Unfortunately, many writers use a word processor as if
- it were simply a correctable typewriter. This may
- include, among other bad habits, using the space bar to
- align text or to "nest" paragraphs. TEXTCON will not
- perform very well on this type of file, because it is
- neither a straight ASCII file nor a true WordStar file.
-
- The -W option tells TEXTCON to treat the input file as a
- "semi-formatted" WordStar file, thus correcting for
- these sloppy typing habits. If you don't know how to
- recognize a poorly done WordStar file, try the
- conversion both with and without the -W option and
- compare the results. Most WordStar files that I have
- tried converted better with the use of this option than
- without it.
-
- (For the technically minded, the -W option tells TEXTCON
- to convert all soft spaces and soft carriage returns to
- hard spaces and hard returns in order to determine the
- intended formatting of the file. TEXTCON then strips
- out any of the spaces and carriage returns that it
- determines are not needed. The most common undesired
- side-effect of this is that TEXTCON will occasionally
- make a wrong paragraphing decision.)
-
-
- 6. -X, -Y
-
- As described under "POSSIBLE PROBLEMS", below, line-
- ending hyphens are normally preserved and a space is
- inserted after them, so that you can find each one and
- make a decision as to whether it needs to be kept in the
- document. If you already know that all hyphens are
- required hyphens or that all of them are "soft" hyphens,
- you can save some editing time by using the -X or -Y
- options.
-
- The -X option indicates that all line-ending hyphens are
- required hyphens. TEXTCON will leave them in the text
- and will not insert a blank. This is useful if you know
- that no "soft hyphenation" has been performed on the
- file.
-
- The -Y option indicates that all line-ending hyphens are
- "soft" hyphens, and that TEXTCON should remove them
- entirely. This is not a very useful option, because it
- would be a rare document that you could safely assume
- had no line-ending required hyphens.
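-
- The three ways of treating a line-ending hyphen can be
- compared with a small sketch. It is written in Python with a
- hypothetical function name and is not TEXTCON's own code; the
- mode names simply mirror the -X and -Y options.
-

```python
def join_lines(first, second, mode="default"):
    """Join two lines of a paragraph, handling a line-ending hyphen.

    mode "default": keep the hyphen and insert a space after it
    mode "X":       keep the hyphen, no space (required hyphen)
    mode "Y":       remove the hyphen entirely (soft hyphen)
    """
    first = first.rstrip()
    second = second.lstrip()
    if not first.endswith("-"):
        return first + " " + second      # ordinary word wrap
    if mode == "X":
        return first + second
    if mode == "Y":
        return first[:-1] + second
    return first + " " + second


for mode in ("default", "X", "Y"):
    print(mode, "->", join_lines("the word ex-", "ample", mode))
```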
-
-
- 7. -Z#
-
- This is a very specialized option that would not often
- be used on standard document files. It allows you to
- specify an alternative character that marks the ends of
- "paragraphs" in your file.
-
- The character is specified by means of its decimal ASCII
- code, so for example, -Z14 would look for a Ctrl-N to
- mark the ends of paragraphs, -Z35 would look for the
- symbol #, and -Z236 would look for the infinity symbol
- ∞. The only ASCII values not allowed are 0 and 255.
-
- When this option is used, TEXTCON will do two things
- differently:
- a. treat all carriage returns as soft returns, removing
- them from the file, and
- b. treat all occurrences of the specified character as
- hard returns, removing them from the file and
- substituting a carriage-return/line-feed pair.
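-
- The two steps above can be sketched as follows. This is a
- Python illustration with a hypothetical function name, not
- TEXTCON's own code. Replacing each removed carriage return with
- a single space is an assumption; the text says only that the
- returns are removed.
-

```python
def convert_paragraph_marker(data: bytes, code: int) -> bytes:
    """Apply the two -Z# steps to raw file contents."""
    if not 1 <= code <= 254:
        raise ValueError("ASCII values 0 and 255 are not allowed")
    marker = bytes([code])
    # (a) Treat existing returns as soft: drop them (assumed: one space).
    data = data.replace(b"\r\n", b" ").replace(b"\r", b" ")
    # (b) Treat the marker as a hard return: substitute CR/LF for it.
    return data.replace(marker, b"\r\n")


raw = b"first line\r\nof a paragraph#second\r\nparagraph#"
print(convert_paragraph_marker(raw, 35))   # 35 is the code for '#'
```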
-
- This option can be extremely useful for certain types of
- file transfers, particularly those involving databases,
- certain desktop publishing applications, and
- manipulations of bulletin board message files.
-
-
-
- SPECIAL PROCESSING DESCRIPTORS
-
- 1. -K<sub-options>
-
- As mentioned earlier, one of TEXTCON's major jobs is to
- remove certain unneeded elements from your file. In
- some cases you may want some of these elements to be
- kept; the -K option allows this.
-
- The -K option is a bit different from the other options
- in the way it is specified. It has several "sub-
- options" represented by additional key letters, which
- must immediately follow the -K. If, for example, you
- wanted only the S sub-option, the full option descriptor
- would be -KS, whereas if you wanted all of the sub-
- options, you would use -KSCRB. (You may also use the
- full option more than once on the command line, so -KS
- -KC -KR -KB would also invoke all of the sub-options.)
-
- The "Keep" sub-options are as follows:
-
- a. S sub-option
-
- The S sub-option of Keep instructs TEXTCON to keep
- all spaces in the converted file.
-
- In addition to the substitution of tabs for multiple
- spaces (described under the -T# option below),
- TEXTCON replaces any set of two or more spaces with
- a single space unless it is at the end of a
- sentence. At the end of a sentence, it replaces
- three or more spaces with two. This helps with
- files that have had spaces added to justify the
- right margin. TEXTCON also removes all leading and
- trailing spaces from each line it processes.
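-
- The space-squeezing rule can be sketched with a regular
- expression. This is a Python illustration with a hypothetical
- function name, not TEXTCON's own code. Treating ".", "!" and "?"
- as the only sentence endings is an assumption, and the tab
- substitution described under -T# is left out.
-

```python
import re


def squeeze_spaces(line):
    """Collapse runs of spaces: two after a sentence end, one elsewhere.

    Leading and trailing spaces are removed first.
    """
    def shrink(match):
        previous = match.group(1)
        # Assumed sentence terminators; the manual does not list them.
        keep = "  " if previous in ".!?" else " "
        return previous + keep

    return re.sub(r"(\S) {2,}", shrink, line.strip(" "))


print(repr(squeeze_spaces("  The  quick   fox.   Next  one. ")))
```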
-
- In some special cases this processing may be
- undesirable. The S sub-option of Keep overrides
- both the substitution of tabs for multiple spaces
- and the deletion of spaces, so that all spaces are
- kept as found in the original file. You would not
- normally want to use this option for a file with a
- left margin of spaces, because those spaces would be
- incorporated into the paragraphs of text.
-
- b. B sub-option
-
- The B sub-option of Keep instructs TEXTCON to keep
- all blank lines (except those within double-spaced
- paragraphs) in the converted file.
-
- Normally, if TEXTCON encounters more than two
- consecutive blank lines (or four in a double-spaced
- document) it removes the "extra" ones (in either
- case, leaving only two in the converted document).
-
- It also tries to recognize print-image files, i.e.
- ones that contain the actual page breaks in the form
- of multiple blank lines or form-feed characters at
- the end of one page and beginning of the next. If
- it does recognize this, it will remove the page
- break entirely and will reconstruct a paragraph
- broken between the pages. When TEXTCON's analysis
- detects this type of format, it prints a message
- describing the file as "page-formatted".
-
- The B sub-option of Keep overrides this blank-line
- stripping, so that all blank lines are kept in the
- file.
-
- c. R sub-option
-
- The R sub-option of Keep instructs TEXTCON to keep
- all carriage returns in the converted file.
-
- Some word processors (including WordStar, Microsoft
- Word, SuperWriter, and WordVision) create files that
- do not have carriage returns at the ends of lines,
- but only at the ends of paragraphs. This greatly
- simplifies the job that TEXTCON has to do. TEXTCON
- will normally recognize these files, and display the
- message "All carriage returns will be preserved."
- If it does not recognize such a file, the usual
- symptom is that the converted file frequently has
- what should be separate paragraphs combined into one
- paragraph. In this case you will need to use the R
- sub-option of Keep.
-
- This is needed very rarely, however. The most common
- use that I have found for this sub-option is to take
- advantage of some of TEXTCON's other features, such
- as tab insertion or double-to-single-spacing
- conversion, without its carriage-return stripping.
-
- Note that the R sub-option does not affect blank
- lines. These are still stripped from the file
- according to the rules explained above. If you want
- to keep all lines intact you must use both the R and
- B sub-options.
-
- d. C sub-option
-
- The C sub-option of Keep instructs TEXTCON to keep
- all control codes (ASCII characters between 1 and
- 31).
-
- TEXTCON normally strips all control codes, with the
- exception of tab characters. If you want control
- codes kept, use the C sub-option.
-
-
- 2. -T#
-
- TEXTCON was designed primarily for importing files to
- the more sophisticated word processors, where documents
- are often printed with proportional spacing. For this
- kind of work, tabs are used extensively to position
- items in a document; multiple spaces will not work
- correctly. For this reason, TEXTCON preserves tabs
- rather than expanding them with blanks. In some cases,
- multiple blanks are preferable, so I have provided an
- option for this.
-
- The -T option requires a numeric value (e.g., -T4 or
- -T0), specifying the number of spaces between tab stops.
- The first tab stop is always at column one. When a tab
- is found, enough spaces are substituted in the converted
- file to position the following character at the next tab
- stop. The default, of course, (if the -T option is not
- specified at all) is that tabs are preserved, whereas a
- value of zero (-T0) means they are removed entirely.
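-
- Tab expansion as described here can be sketched as follows.
- This is a Python illustration with a hypothetical function
- name, not TEXTCON's own code. Columns are counted from zero
- internally, so the stops at columns 1, n+1, 2n+1, ... become
- offsets 0, n, 2n, ...
-

```python
def expand_tabs(line, n):
    """Replace each tab with spaces up to the next tab stop.

    n is the spacing between tab stops; n == 0 removes tabs entirely.
    """
    if n == 0:
        return line.replace("\t", "")
    out = []
    col = 0  # zero-based offset of the next character to be written
    for ch in line:
        if ch == "\t":
            pad = n - (col % n)      # distance to the next stop
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


print(repr(expand_tabs("a\tb", 4)))
print(repr(expand_tabs("a\tb", 0)))
```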
-
- If the -T option is used, TEXTCON's normal behavior of
- substituting tabs for multiple spaces is also turned
- off. This substitution is normally done in three
- circumstances: at the beginning of a paragraph whose
- first line is indented; between items in a columnar
- table; and between a list-identifying number, letter, or
- other symbol and the corresponding list entry (for
- example, the item "3. -I#" just below).
-
- This means that if you have a file that does not contain
- tabs, and you simply want to suppress TEXTCON's
- substitution of tabs for spaces, you can use -T with any
- numeric value to accomplish this. The number you use
- doesn't really matter here, since it is used only to
- determine the number of spaces to substitute when a tab
- is found in the original file.
-
-
- 3. -I#
-
- As described under the -T# option, TEXTCON normally
- substitutes a tab character for multiple spaces at the
- beginning of indented paragraphs. The -I# option allows
- you to use a specific number of spaces instead, or to
- convert indented paragraphs to block-style paragraphs.
-
- This option requires a numeric value indicating how many
- spaces are to be used for indentation. If, for example,
- you specify -I5, all indented paragraphs in the
- converted file will have a first-line indentation of
- five spaces. Using -I0 will convert indented paragraphs
- to block-style paragraphs (zero indentation). The -I#
- parameter has no effect at all on paragraphs that are
- already block-style or have hanging indents.
-
-
- 4. -P#
-
- TEXTCON normally leaves paragraphs spaced the same way
- they are spaced in the original file. The usual style
- for single-spaced documents has one blank line between
- paragraphs; double-spaced documents usually have no
- extra blank lines. If your original document has one
- kind of line spacing and you want to print the new
- document with different spacing, you may find that the
- paragraph spacing is either too large or too small.
-
- The -P# option lets you change that spacing. For
- example, -P0 will eliminate any extra blank lines
- between paragraphs, so you might use it if your original
- file was single-spaced and you wanted to print the new
- copy double-spaced. -P1 will end each paragraph with
- exactly one blank line, so you might use it for the
- opposite case.
-
- The -P# parameter has no effect on paragraphs that
- consist of a single line of text; those are assumed to
- be lists or tables whose spacing should be preserved.
-
-
- 5. -L#
-
- TEXTCON automatically determines a "typical" line length
- for your document and from this calculates a "cutoff"
- length used in its paragraph-determination algorithms.
- If a line is shorter than the cutoff length, TEXTCON
- assumes that the carriage return at the end of that line
- was put there intentionally, and the program will not
- delete it.
-
- You can use the -L# option to override TEXTCON and
- specify your own cutoff length. Note that the length of
- a line is not measured from the very beginning of the
- line (column 1), nor is it measured from the first non-
- blank character on that particular line. The length is
- measured starting at the left margin of the document,
- which is determined by the leftmost non-blank character
- found anywhere in the document. If, for example, the
- left margin of the document were 10 characters (meaning
- the leftmost character in any line occurred in position
- 11) and the cutoff length were 30, a line with 15
- leading spaces followed by 20 characters would have a
- length of 15+20-10 = 25, and so would be shorter than
- the cutoff length.
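-
- The length calculation in this example can be written out as
- a short sketch. It is a Python illustration with hypothetical
- function names, not TEXTCON's own code; it reproduces the
- 15+20-10 = 25 arithmetic from the paragraph above.
-

```python
def left_margin(lines):
    """Number of blanks before the leftmost non-blank character anywhere."""
    return min(len(l) - len(l.lstrip(" ")) for l in lines if l.strip())


def effective_length(line, margin):
    """Line length measured from the document's left margin."""
    return len(line.rstrip()) - margin


document = [
    " " * 10 + "x" * 40,   # an ordinary line starting at position 11
    " " * 15 + "y" * 20,   # 15 leading spaces followed by 20 characters
]
margin = left_margin(document)
length = effective_length(document[1], margin)
print(margin, length, length < 30)
```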
-
-
- 6. -S#
-
- The -S# option is useful for word processors such as
- Volkswriter and PC-Write which require carriage returns
- at the end of each line rather than at the end of each
- paragraph. It tells TEXTCON to split each paragraph
- into lines of a particular length, given by the numeric
- parameter. For example, -S65 says that the output file
- should contain lines that are approximately 65
- characters long.
-
- When you use this option TEXTCON splits lines at the
- first space following the specified length. This means
- that the lines in the file will, on average, be one word
- longer than the length you specify, and some of them may
- be as much as 10 or 15 characters longer.
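-
- Splitting at the first space following the specified length
- can be sketched like this. It is a Python illustration with a
- hypothetical function name, not TEXTCON's own code, and it
- shows why output lines run slightly longer than the number
- given.
-

```python
def split_paragraph(text, length):
    """Break one long paragraph line at the first space at or after `length`."""
    lines = []
    while True:
        cut = text.find(" ", length)
        if cut == -1:            # no further space: the rest is the last line
            lines.append(text)
            return lines
        lines.append(text[:cut])
        text = text[cut + 1:]


paragraph = "aaaa bbbb cccc dddd"
for piece in split_paragraph(paragraph, 6):
    print(len(piece), piece)
```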
-
- This option will only work on files that have very long
- lines, that is, those files where TEXTCON will keep all
- carriage returns. It will not, for example, allow you
- to take a file with paragraphs made up of 80 character
- lines and reformat those into paragraphs of 60 character
- lines. That would require it to remove some carriage
- returns and add others, which it cannot currently do.
- Almost any word processor should be capable of that kind
- of reformatting.
-
-
-
- POSSIBLE PROBLEMS:
-
-
- Many of TEXTCON's decisions are based on its analysis of the
- beginning of your input file. It analyzes approximately two
- pages of text, but this will vary from file to file. If
- your file has sections that are very distinct in formatting,
- the parameters that TEXTCON determines from the beginning of
- your file may not be accurate for the rest of the file. In
- these cases, TEXTCON will perform better if you subdivide
- the input file and process each distinctly formatted section
- separately.
-
- Words from the original file that are hyphenated at the end
- of a line will remain hyphenated, and an extra space will be
- inserted following the hyphen. For example, the word ex-
- ample will be converted to ex- ample. You can find these
- and convert them fairly easily by searching for "- " (a
- hyphen followed by a blank). The program could have been
- designed to remove hyphens at the ends of lines, but then it
- would also have removed required hyphens, as in ex-
- president. You may want to use the -X or -Y options to
- change this behavior.
-
- When a converted file is loaded into the new word processor,
- tables may have their columns too close together or too far
- apart. This is because TEXTCON puts tab characters into
- tables, but it cannot set the positions for the tab stops.
- As soon as you set the tab stops where you want them, the
- columns will line up correctly. The TEXTDCA version of
- TEXTCON can also preserve the settings of the tab stops,
- thus saving some additional time.
-
- Sometimes TEXTCON will fail to remove the carriage returns
- within a nested or fully-indented paragraph. A common
- reason for this is that the person who created it started
- each line with a tab, rather than using an indent command.
- You can get around this by using the -T# option with some
- suitable tab value (usually 5 is a good choice).
-
- This problem will also occur if the paragraph is indented a
- large amount from the right margin, making the lines shorter
- than the cutoff length. Correct this with the -L# option,
- using a numeric value that is less than the shortest line.
- Be sure to take into account the document margin when
- calculating this number.
-
- The program is written in C, using the DeSmet C compiler. I
- have tested it only on IBM PC-compatibles, but it should
- work on other MSDOS machines. I have heard from one person
- who has used it successfully on a DEC Rainbow.
-
-
-
- OTHER USES FOR TEXTCON:
-
-
- TEXTCON users have found some ingenious ways to use the
- program - tasks for which the program was not intended, but
- which it does quite well. The following examples may
- suggest some additional ways TEXTCON can aid in your text
- processing work.
-
- 1. Use of the Keep Option
-
- TEXTCON's -K option figures prominently in most of these
- special uses. If you use -K with all of its sub-options
- (-KBCRS), the output file will be identical to the input
- file, with a few exceptions. This would seem to be a
- pointless thing to do, unless, of course, those
- exceptions are important to you. They are as follows:
- a. If the input file has lines that end with only a
- carriage return, TEXTCON will add a line feed to
- each of them. You may occasionally get files of
- this type, from certain programs or from other
- computers, and you may find that your word processor
- will not accept them without the line feeds.
- b. If the input file has no detectable formatting,
- TEXTCON assumes it was intended for a print
- formatter. In this case, TEXTCON will remove "dot"
- commands from the file.
- c. TEXTCON deletes trailing blanks from each line.
- d. WordStar files are always converted to ASCII.
-
- Each of these conversions can be extremely useful for
- certain kinds of files, even when you don't need the
- carriage-return stripping that is TEXTCON's main
- purpose.
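-
- Exceptions (a) and (c) above are simple byte-level changes.
- The sketch below, in Python, shows their net effect; it is
- an illustration under assumptions, not TEXTCON's code. The
- function names are hypothetical, and "blank" is taken to
- mean the space character only.
-
```python
import re


def add_line_feeds(data: bytes) -> bytes:
    # Append a line feed to every carriage return that lacks one.
    return re.sub(rb"\r(?!\n)", b"\r\n", data)


def strip_trailing_blanks(data: bytes) -> bytes:
    # Drop spaces that sit directly before a CR-LF pair or the end of data.
    # Assumption: only the space character counts as a "blank".
    return re.sub(rb" +(?=\r\n|\Z)", b"", data)


raw = b"one  \rtwo\r\nthree "
print(strip_trailing_blanks(add_line_feeds(raw)))
```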
-
-
- 2. Adding Carriage Returns
-
- You may sometimes get files from another computer where
- a line-feed character, rather than a carriage return, is
- used to mark the ends of lines. This causes great
- difficulty for some PCDOS software.
-
- TEXTCON can convert these files by use of the -Z#
- option. The decimal ASCII code for line feed is 10, so
- the full option would be -Z10. You may also want to use
- -KBCS to keep other characteristics of the file intact.
- The -Z# option overrides the -KR option.
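-
- The net effect on the line endings can be pictured with
- the short Python sketch below. It is an illustration only:
- it says nothing about how TEXTCON handles -Z10 internally,
- and the function name lf_to_crlf is hypothetical.
-
```python
import re


def lf_to_crlf(data: bytes) -> bytes:
    # Put a carriage return in front of every line feed that lacks one.
    # Lines that already end in CR-LF are left untouched.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


print(lf_to_crlf(b"first line\nsecond line\n"))
```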
-
-
- 3. Removing Blank Lines
-
- TEXTCON removes multiple blank lines by default, but
- leaves up to two blank lines separating paragraphs. If
- you want to remove all blank lines from a file, use the
- -P0 option. One TEXTCON user needed a count of only the
- non-blank lines in an ASCII file, but couldn't find a
- counting program that would do that. Using TEXTCON with
- -P0 and -KR produced a file with only the blank lines
- removed, which an ordinary line counter could then handle.
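-
- For comparison, the same count can be sketched in a few
- lines of Python. This is not TEXTCON's method; the function
- name is hypothetical, and a line holding only spaces or
- tabs is assumed to count as blank.
-
```python
def drop_blank_lines(text):
    # Keep only the lines that contain something besides whitespace.
    return [line for line in text.splitlines() if line.strip()]


sample = "first\n\n\nsecond\n   \nthird\n"
kept = drop_blank_lines(sample)
print(len(kept), "non-blank lines")
```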
-
-
- 4. Tab Expansion
-
- For certain programs and certain applications it may be
- inconvenient to have tabs in a file. TEXTCON can replace
- them with spaces via the -T# option. If you use this
- along with the -KBCRS option, the output file will be
- nearly identical to the input file, but with tabs
- expanded to spaces.
-
- This option can also be useful when dealing with badly
- formatted files. Some people create fully indented
- paragraphs by inserting a tab at the beginning of every
- line of the paragraph rather than by using their word
- processor's indent function. This creates a mess if you
- have to edit those paragraphs or move them to another
- word processor. TEXTCON will interpret them as
- individual lines, not as paragraphs. However, if you
- use the -T# option, TEXTCON will correctly recognize
- them as fully indented paragraphs.
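-
- The Python sketch below illustrates expansion to tab stops,
- where each tab advances to the next column that is a
- multiple of the tab value rather than adding a fixed number
- of spaces. It relies on Python's built-in str.expandtabs
- and is only an illustration; the function name expand_tabs
- and the default value of 5 are assumptions made here.
-
```python
def expand_tabs(text, tab_value=5):
    # str.expandtabs pads each tab out to the next multiple of tab_value
    # and restarts the column count after every line ending.
    return text.expandtabs(tab_value)


# A table row: each tab moves to the next tab stop.
print(repr(expand_tabs("a\tbc\td")))

# A paragraph whose lines each start with a tab gets a uniform indent.
print(repr(expand_tabs("\tline one\n\tline two\n")))
```
-
- Once every line of such a paragraph starts with the same
- run of spaces, it has the shape of a fully indented
- paragraph.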
-
-
-
- GUIDELINES FOR SPECIFIC WORD PROCESSORS:
-
-
- The following gives some guidelines for preparing documents
- for an ASCII file transfer with five different word proces-
- sors. It is written with the assumption that you will be
- transferring the files to the NBI Oasys 64 system. However,
- almost all of the advice also applies to transfers into
- WordStar, Word Perfect, and Microsoft Word.
-
- There are certain restrictions imposed by the ASCII transfer
- process that apply to all word processors:
-
- 1) All character formatting, such as boldface, underline,
- subscripting, and superscripting, is lost. The text
- that it applies to remains intact, but the formatting
- must be redone on the target word processor.
-
- 2) All spacing characteristics, such as margins, inden-
- tation, centering, and justification, are lost. All
- paragraphs, including nested ones, will be moved to
- start at the left margin. Paragraphs with a first-line
- indent (i.e., non-block style) will have a tab inserted
- for this purpose.
-
-
- Microsoft Word
-
- Microsoft Word is probably more compatible with the NBI sys-
- tem than is any other word processor. There are only two
- rules to remember when using Word. Use tabs rather than
- multiple spaces in your original document for maximum
- flexibility in reformatting. Don't use the "newline" key
- (Shift-Enter), because it does not appear in an ASCII file
- as a line-ending character.
-
- Characteristics that you assign to your document using the
- Format command (Character, Paragraph, or Division) will be
- lost in the transfer. Footnotes entered using Format
- Footnote will be transferred, but they will all be grouped
- together at the end of the file and there will be no indi-
- cation of where they are referenced.
-
- To save your file in ASCII form, you must use the Transfer
- Save command with the "Formatted" option set to "No". Be
- sure to give the file a different name than you have been
- using for your document, or the ASCII file will replace your
- formatted document file. (If you make this mistake, you
- should immediately exit Word and copy the .BAK file back to
- the .DOC file and then try the save again.)
-
-
- Word Perfect
-
- Use tabs rather than multiple spaces in your original
- document for maximum flexibility in reformatting.
-
- Footnotes and endnotes will not be included in the ASCII
- file; they will have to be retyped on the NBI.
-
- You should use the Text In/Out command (Extended Features -
- Prepare/Protect in version 3) to save your file in ASCII
- form. Do not use the Print command to create this file.
- When you use the Text In/Out command, do not use the same
- file name that you use for your document, or the ASCII file
- will replace your formatted document file and there will be
- no way to get it back.
-
- Version 4.2 of Word Perfect has a new, additional method of
- file saving available under the Text In/Out command. This
- creates an ASCII file with only the paragraph-ending
- carriage returns included. If you create an ASCII file
- using this option, there is no need to use TEXTCON before
- importing the file to another word processor, unless you
- want to do some special reformatting such as expanding tabs,
- changing paragraph spacing, etc.
-
-
- PC-Write and Volkswriter
-
- Files from these two word processors can be the most diffi-
- cult ones to transfer, because their format is so dependent
- on the style of the particular writer. Fortunately,
- TEXTCON's paragraph-recognition algorithms really shine when
- working with files from these word processors, so there are
- relatively few rules for you to follow.
-
- Turn the Justify option off, so your text is not right-
- justified. TEXTCON will take out extra blanks that are
- inserted for justification, but it occasionally makes
- mistakes.
-
- PC-Write and Volkswriter convert all tabs to spaces, which
- means that tables may not transfer well. TEXTCON tries to
- put tabs back in where they are needed, but will not always
- do this correctly.
-
- Footnotes will be transferred to the NBI, but they will
- appear in the middle of the text that references them.
-
- PC-Write and Volkswriter files are always stored in ASCII
- form, so you don't have to do any special type of save
- before transferring the file.
-
-
- WordStar
-
- WordStar files can be converted very effectively by TEXTCON.
- The only limitation is that tables may not transfer very
- well, because WordStar converts all tabs to spaces. TEXTCON
- tries to put tabs back in where needed, but will not always
- do this correctly.
-
- WordStar files can be read directly by TEXTCON; you do not
- have to do a print to disk. Be sure to try the -W option
- when using TEXTCON on WordStar files. It is not required,
- but it generally gives a better translation.
-
-
-
- HISTORY:
-
-
- The original version of this program was written to assist
- in the transfer of documents from microcomputer word proces-
- sors and optical scanners to an NBI Oasys office system.
- Because of the wide variety of word processors used by the
- people involved, it wasn't practical to try to accommodate
- all the different internal file formats. Instead, the
- program was designed to reformat standard ASCII files into a
- form that could be imported easily into another word
- processor. It was very important that the program properly
- process as many different varieties of ASCII file formats
- and writing styles (indentations, paragraphing, line
- spacing, etc.) as possible.
-
- The original program was quite complicated to use and the
- algorithm employed was quite simple-minded, so that certain
- formats were not handled as well as they could be. This new
- program is based on two years' experience examining files
- from different writers with different word processors,
- talking to secretaries about how they set up different
- documents, and identifying the significant patterns of lines
- and characters that commonly occur. The program now
- analyzes each input file and makes very intelligent
- decisions about the type of file and the paragraphing style
- used.
-
- Version 1.1
-
- This was the first version to be widely distributed.
-
-
- Version 1.2
-
- 1. Renamed the former -L option to -B.
- 2. Changed the -T# option to use true tab stops.
- 3. Added new -L# option, as well as -H, -Y, and -R.
- 4. Expanded the file analysis stage to determine
- additional document characteristics, including
- typical line length, standard margin, recognition of
- unformatted, formatted, and print-formatted files,
- and header and footer locations.
- 5. Fixed a bug in the table-recognition section.
- 6. Additional fine-tuning of parameters and algorithms,
- particularly in regard to hanging indents, centered
- lines, and list items.
-
-
- Version 1.3
-
- (NOTE: If you used an earlier version and have not sent
- a contribution, please consider doing so now.)
-
- 1. Renamed many options. -B, -C, and -S became the B,
- R, and S sub-options of the new -K (for Keep)
- option. -R was split into the -F and -H options,
- both of which now accept a line number as a
- parameter. -D became -2, and -H became -X. I hated
- to change these so substantially, but it really
- seemed necessary. I don't think a major change will
- be necessary again.
- 2. Dropped one option. -W was no longer needed because
- of improvement in recognition of WordStar files.
- However, see the new -W option below.
- 3. Added new options. -1 option specifies single
- spacing. -B specifies that all paragraphs are block
- style. -M# specifies the minimum size of the left
- margin of the document. -W option specifies
- different processing of WordStar files. -Z#
- specifies that the original file has a particular
- character that always marks paragraph ends. -S#
- will split files with long lines into shorter lines.
- The C sub-option of the -K option specifies that
- control codes are to be kept in the new file.
- 4. Automatic removal of lines that are added only for
- print emphasis. In a file whose lines end in CR-LF
- pairs, these are easily recognized because they are
- preceded by a line without a line feed.
- 5. Additional improvement of decision rules and general
- fine tuning for better paragraph recognition.
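-
- The recognition rule in item 4 can be pictured with the
- Python sketch below. It is a guess at the idea, not the
- program's code: the function name is hypothetical, and it
- assumes that everything after a bare carriage return inside
- a CR-LF line is an overprint pass to be discarded.
-
```python
def drop_emphasis_lines(data: bytes) -> bytes:
    kept = []
    for record in data.split(b"\r\n"):
        # A bare CR inside a CR-LF record means the printer went back to
        # column 1 and printed again; keep only the first pass.
        kept.append(record.split(b"\r")[0])
    return b"\r\n".join(kept)


print(drop_emphasis_lines(b"Title\rTitle\r\nbody text\r\n"))
```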
-
-
- TEXTDCA Version 1.3
-
- Introducing a new program which will be sent to those
- contributing $25 or more for TEXTCON. It has two
- features not found in TEXTCON:
- 1. DCA/RFT output format. The -D option specifies that
- instead of an ASCII file, the output should be
- written in DCA/RFT format. Most of the major PC
- word processors now support this format, which,
- unlike ASCII files, can contain formatting
- information such as margins, centering, tab
- settings, indents, etc. Now TEXTCON can pass all
- this information on to your word processor, saving a
- tremendous amount of reformatting.
- 2. Menu mode. TEXTDCA permits optional menu-driven
- selection of processing options, for those who have
- trouble with its normal command-line syntax. The
- menu system works only on IBM-PC-compatibles, not on
- MSDOS machines such as the Wang PC, DEC Rainbow, TI
- Professional, Tandy 2000, etc.
-
-
-
- DISTRIBUTION:
-
-
- The TEXTCON program described above is Copyright, 1986,
- Chris Wolf.
-
- TEXTCON accomplishes its purpose as described above and, if
- used carefully, will cause no known damage to a computer
- system or its files. All users should maintain backup
- copies of their own files and the author bears no responsi-
- bility for losses arising from their failure to do so.
-
- TEXTCON may be freely copied and distributed to others, but
- no one may charge a fee for such distribution, beyond a
- modest disk preparation charge. All copies of TEXTCON must
- be accompanied by this documentation file.
-
- I intend to support this program, continue to enhance it,
- and fix bugs in it. If you encounter problems with it, or
- have questions about it, I would like to hear from you. My
- CompuServe ID is 72446,2704.
-
- If you find that this program saves you time in your work
- and you use it regularly, please send me a contribution to
- help offset the time and resources I have spent developing
- and supporting it. If you are using it in an office
- environment, with multiple users on multiple computers,
- please consider this in determining the size of your
- contribution.
-
- Those who contribute $25 or more will be sent the TEXTDCA
- program, which is described elsewhere in this documentation.
-
- Chris Wolf
- 1521 Greenview Ave.
- East Lansing, MI 48823
- office phone - (517) 353-5017